Mark Dredze

FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights

Feb 02, 2026

Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

Jan 12, 2026

Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

Jan 10, 2026

Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

Nov 08, 2025

Evaluating the Evaluators: Are readability metrics good measures of readability?

Aug 26, 2025

What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models

Jun 06, 2025

Label-Guided In-Context Learning for Named Entity Recognition

May 29, 2025

MedScore: Factuality Evaluation of Free-Form Medical Answers

May 24, 2025

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models

Apr 25, 2025

Understanding and Mitigating Risks of Generative AI in Financial Services

Apr 25, 2025